Brief Research Outline


The popularity of Distributed Memory Multiprocessors (DMMs) has been on the rise in recent years.
The increase can be attributed to advances in VLSI technology, better inter-processor communication
networks, and more efficient routing algorithms. Applications that use DMMs include fluid flow,
weather modeling, and database systems. The object-oriented design methodology
is widely used to solve these practical problems.

To obtain the maximum benefit from DMMs, an efficient task partitioning and scheduling strategy
is essential. The task partitioning algorithm detects the parallelism in the program, partitions
the application into separate tasks, and represents them in the form of a Directed Acyclic
Graph (DAG). After the application is transformed into a DAG, the tasks are scheduled onto the
processors. This work deals with automating parallelism detection to obtain better
partitions of applications, and then scheduling the partitioned tasks to reduce the overall
execution time.
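
As a minimal illustration of the DAG representation described above, the following Python sketch stores a small program as a mapping from each task to the tasks it depends on, then groups the tasks into stages whose members could run in parallel. The task names, the dependencies, and the function `parallel_stages` are hypothetical and are not taken from this research; the grouping is plain topological ordering, not the partitioning or scheduling algorithm developed here.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key is a task, each value is the set of
# tasks whose results it needs (its predecessors in the DAG).
predecessors = {
    "read": set(),
    "filter": {"read"},
    "stats": {"read"},
    "report": {"filter", "stats"},
}

def parallel_stages(preds):
    """Group tasks into stages; tasks within a stage are mutually independent."""
    sorter = TopologicalSorter(preds)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose inputs are done
        stages.append(ready)
        sorter.done(*ready)                 # mark the whole stage as finished
    return stages

print(parallel_stages(predecessors))
```

In this example "filter" and "stats" land in the same stage because neither depends on the other, which is the kind of parallelism a scheduler can exploit.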

As  the  title  of  the  project  suggests,  there  are  two  main  objectives  of  this  research.  Those  two 
objectives  are  described  below  in  detail. 

1  Parallelism  Detection 

There are several object-oriented languages, but they fail to integrate concurrency and
object-orientedness. Even the available concurrent object-oriented languages suffer from the
inheritance anomaly, which forces non-trivial class redefinitions because the synchronization code
cannot be effectively inherited. This research has led to the development of a new concurrent
object-oriented language, CORE, which does not suffer from the inheritance anomaly and allows a
programmer to specify parallel tasks at the inter- and intra-object levels. To achieve the best
results when running a program on a parallel machine, the compiler needs to automatically detect
the inherent parallelism in the program. The current goal of this research is to investigate
techniques for automatically detecting parallelism in object-oriented programs at compile time.

The concurrent object-oriented language CORE minimizes the inheritance
anomaly, promotes reuse of both the sequential and the parallel components of the code, and offers
mechanisms for controlling computational granularity. The methodology presented in this research
is extensible to any class-based language.

2  Scheduling 

Efficient use of resources requires a proper scheduling algorithm. A scheduling algorithm can
pursue several goals. This research takes reducing the overall execution time of a program as
the primary goal. A Threshold Scheduling scheme has been proposed for DMMs. In this scheme,
the compile-time schedule is found using a new concept, the threshold of a task, which quantifies
a trade-off between the schedule length and the degree of parallelism. When multiple tasks
compete for the same processor, the tie is broken using a merit function that measures how well
each task matches the processor. The threshold is varied globally,
and the best value to satisfy one of the following scheduling goals is found:

•  Compiling  for  minimizing  schedule  length:  Suitable  for  large  systems. 

•  Compiling  for  processor  requirements  below  a  given  maximum  number  of  available  processors 
in  the  system:  Suitable  for  small  systems. 
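
The selection step described above can be sketched as follows. The sketch assumes that one candidate schedule has already been produced for each threshold value; how the threshold and the merit function shape a schedule is not reproduced here. The `Candidate` record, both function names, and the sample numbers are hypothetical and serve only to illustrate the two goals.

```python
from typing import NamedTuple, Optional, Sequence

class Candidate(NamedTuple):
    threshold: float   # global threshold value that produced the schedule
    length: float      # schedule length (estimated completion time)
    processors: int    # number of processors the schedule uses

def best_for_length(cands: Sequence[Candidate]) -> Candidate:
    """Goal 1: shortest schedule; ties go to the one using fewer processors."""
    return min(cands, key=lambda c: (c.length, c.processors))

def best_within_limit(cands: Sequence[Candidate],
                      max_processors: int) -> Optional[Candidate]:
    """Goal 2: shortest schedule that fits on max_processors processors."""
    feasible = [c for c in cands if c.processors <= max_processors]
    return best_for_length(feasible) if feasible else None

# Made-up results of a threshold sweep, for illustration only.
sweep = [
    Candidate(threshold=0.0, length=40.0, processors=16),
    Candidate(threshold=0.5, length=46.0, processors=8),
    Candidate(threshold=1.0, length=60.0, processors=4),
]

print(best_for_length(sweep))       # large system: only length matters
print(best_within_limit(sweep, 8))  # small system: at most 8 processors
```

With these made-up numbers, the first goal picks the 16-processor schedule, while a limit of 8 processors forces the longer schedule produced at threshold 0.5.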


Run-time support for the Sisal compiler has been developed for the Intel Gamma, Delta, and Paragon
machines. The run-time support made it possible to compare the compile-time estimates with the
actual run-time results. The run-time results for the Sisal compiler using the threshold
scheduling scheme were comparable to the compile-time estimates, although the speedups obtained
on all three machines were lower than the predicted values. This is primarily due to run-time
overheads and to the assumptions made in modeling communication. Of the three machines,
Paragon came closest to the predicted values, followed by Delta and
Gamma. Paragon did better than the other two because of its better hardware and because it
allows multiprogramming. The results obtained are promising, and this scheme could be used for
compiling programs for DMMs of varying sizes.

Another scheduling algorithm for DMMs, the Search and Duplication Based Scheduling
(SDBS) algorithm, has been developed. This algorithm is based on certain realistic assumptions.
If they are satisfied, the algorithm generates an optimal schedule: it has been proved that
no schedule with a lower execution time can be generated. Even
if the assumptions are not satisfied, the SDBS algorithm generates a schedule that is close to
optimal. The results for this case have been obtained through extensive simulations.
The SDBS algorithm has been enhanced to generate an optimal schedule in cases where the task
execution times vary. A task encapsulates control dependencies within it. Thus, for tasks that
encapsulate loops or if-then-else statements, the execution time can vary. The enhanced
SDBS algorithm takes such tasks into consideration and produces an optimal schedule.

The schedule time obtained with the SDBS algorithm for random DAGs was compared against
the level of the entry node. The level of the entry node is the lowest possible schedule time because
communication times are neglected when calculating levels. On average, the schedule time obtained
by the SDBS algorithm was 11% above this absolute lower bound. The lower bound cannot
be achieved in most cases because of the communication time. The performance of the enhanced
SDBS algorithm was compared against that of the SDBS algorithm for cases where the task execution
times vary. For a small DAG, the enhanced SDBS algorithm is better than the SDBS algorithm by
4% on average. This figure is low because the comparison was performed on a small DAG with
only a few tasks having varying execution times. For larger DAGs with more tasks of varying
execution time, we would expect the enhanced SDBS algorithm to perform much better than the
SDBS algorithm.
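
The lower bound used in this comparison can be illustrated with a short Python sketch. It assumes the usual definition of a task's level: its own execution time plus the largest level among its successors, with communication times ignored. The DAG, its execution times, and the names `level` and `percent_over_bound` are hypothetical; the sketch computes only the bound and does not implement the SDBS algorithm.

```python
from functools import lru_cache

# Hypothetical DAG: execution time of each task and its successor lists.
exec_time = {"a": 2, "b": 3, "c": 4, "d": 1}
successors = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

@lru_cache(maxsize=None)
def level(task):
    """Longest execution-time path from task to an exit node (no communication)."""
    tail = max((level(s) for s in successors[task]), default=0)
    return exec_time[task] + tail

def percent_over_bound(schedule_time, entry_task):
    """How far a schedule time lies above the entry-node level, in percent."""
    bound = level(entry_task)
    return 100.0 * (schedule_time - bound) / bound

print(level("a"))  # 2 + max(3 + 1, 4 + 1) = 7
```

Here the entry node "a" has level 7, so a schedule that completes in 7.77 time units would lie about 11% above the bound.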

3  Significant  Research  Results 

• Introduced a new concurrent object-oriented language called CORE which solves the
inheritance anomaly problem.

• By separating and localizing the synchronization schemes from the main bodies of the program
components, CORE allows a high degree of encapsulation and reuse for both the synchronization
code and the program modules.

•  Introduced  a  new  compile  time  method  to  schedule  functional  parallelism  in  a  program  on 
distributed  memory  machines. 

• The scheduling goal for this method can be either to reduce execution time or to
generate a schedule that fits the system size.

•  Concept  of  threshold  introduced  to  quantify  the  trade-off  between  the  completion  time  and 
the  degree  of  parallelism  in  the  schedule. 


•  Scheduler  incorporated  in  the  compiler  backend  for  targeting  Sisal  to  Intel  Touchstone  i860 
systems. 

• Introduced a new scheduling scheme, SDBS, for scheduling tasks represented in the form of a
DAG onto Distributed Memory Machines.

• The SDBS algorithm is optimal when its assumptions are satisfied: no schedule with a lower
execution time can be generated.

•  Enhanced  SDBS  algorithm  developed  which  takes  tasks  with  varying  execution  times  into 
account. 